Abstract

It is known that simulation of the mean position of a Reflected Random Walk (RRW) $\{W_n\}$
exhibits non-standard behavior, even for light-tailed increment distributions with negative
drift. The Large Deviation Principle (LDP) holds for deviations below the mean, but for
deviations at the usual speed above the mean the rate function is null. This paper takes a
deeper look at this phenomenon. Conditional on a large sample mean, a complete sample path
LDP analysis is obtained. Let $I$ denote the rate function for the one dimensional increment
process. If $I$ is coercive, then given a large simulated mean position, under general conditions
our results imply that the most likely asymptotic behavior, $\psi^*$, of the paths
$n^{-1} W_{\lfloor tn \rfloor}$ is to be zero apart from on an interval $[T_0, T_1] \subset [0, 1]$
and to satisfy the functional equation

$$\nabla I\left(\frac{d}{dt}\psi^*(t)\right) = \lambda^*(T_1 - t) \quad \text{whenever } \psi^*(t) \neq 0.$$

If $I$ is non-coercive, a similar, but slightly more involved, result holds.

These results prove, in broad generality, that Monte Carlo estimates of the steady-state mean 
position of a RRW have a high likelihood of over-estimation. This has serious implications 
for the performance evaluation of queueing systems by simulation techniques where steady 
state expected queue-length and waiting time are key performance metrics. The results show 
that naive estimates of these quantities from simulation are highly likely to be conservative. 
1. Introduction

Consider $W = \{W_n, n \ge 0\}$, a random walk that starts at zero, is reflected at the origin, and
has increments process $X = \{X_n, n \ge 0\}$. The Reflected Random Walk (RRW) is governed
by Lindley's recursion [25]:

$$W_0 := 0 \quad \text{and} \quad W_{n+1} = [W_n + X_n]^+ \ \text{for } n \ge 1. \tag{1}$$



This recursion plays a fundamental role in queueing systems and has long been an important 
object of study in evaluating their performance. With X being the difference between service 
times and inter-arrival times of customers, the RRW W describes the evolution of waiting 
times at a single server first-come first-served queue with infinite waiting space. Lindley's 
recursion also governs the evolution of the queue-length at certain single server queues, such 
as the M/M/1 queue 0.
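As an illustration only, the recursion (1) can be simulated in a few lines. The following Python sketch is not part of the original analysis: the function names and the Gaussian increment parameters (`delta`, `sigma`, `seed`) are assumptions made for the example, and the recursion is applied from $W_0 = 0$ onwards.

```python
import random

def lindley_path(increments):
    """Return [W_0, W_1, ..., W_n] with W_0 = 0 and W_{k+1} = max(W_k + X_k, 0)."""
    path = [0.0]
    for x in increments:
        path.append(max(path[-1] + x, 0.0))
    return path

def sample_mean_position(n, delta=1.0, sigma=1.0, seed=0):
    """Sample mean of W_1, ..., W_n for i.i.d. Gaussian increments (illustrative parameters)."""
    rng = random.Random(seed)
    xs = [rng.gauss(-delta, sigma) for _ in range(n)]
    w = lindley_path(xs)
    return sum(w[1:]) / n
```

Repeating `sample_mean_position` over many seeds gives an empirical view of how the simulated mean is distributed around its long run value.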

Since the 1980s, large deviation techniques have been brought to bear on the analysis of
equation (1) and the distribution of an element of its stationary solution, which exists whenever
X is stationary [2^]. For example, using a one-dimensional large deviations approach it has
been established in broad generality that the stationary distribution possesses logarithmic
asymptotics, see [20][12][13][15][23] and references therein. This fact is exploited in the
theory of effective bandwidths [22] and in the development of on-the-fly estimation schemes from
observations of the queueing behavior for the determination of quality of service performance
metrics [11][3][21][36][14][30]. Moreover, through the use of functional large deviation
techniques, assuming X is i.i.d., the seminal paper [1] proved the significant, broadly applicable
result that the most likely path to a large value of the transient RRW is piece-wise linear.
This deduction has since been extended (e.g. [23]), including the establishment of results in
the stationary regime (e.g. [10][19][1^]). All of these papers report piece-wise linear most
likely paths to a large value of the RRW or an element of its stationary solution.

The exclusive focus of all of the research cited above is to understand the likelihood of
large values of the RRW, either in the transient or stationary regime, and to determine
the most likely paths to these large values. In the present article we employ a functional large
deviation approach to analyze the estimation of a fundamental quantity for the performance 
evaluation of a RRW that has so far been overlooked: its mean value. This study reveals 
substantially richer structure than the study of large values of the RRW, leading to non- 
convex rate functions and concave most likely paths. Despite this, perhaps surprisingly, 
general qualitative and quantitative deductions can still be made. 

The starting point of the present paper is the following qualitative result: It was observed
recently that simulation of the mean position of a RRW,

$$\overline{W}_n := \frac{1}{n} \sum_{i=1}^{n} W_i,$$

exhibits non-standard behavior, even for light-tailed increments with negative drift. For
example, if X is i.i.d., then the probability that $\overline{W}_n$ underestimates the long run
expected value decays exponentially in n, but the probability of an over-estimate decays
sub-exponentially. This is shown in the following proposition, which is taken from [32]. Part (i)
follows from Theorem 11.2.3 and part (ii) from Proposition 11.3.4 (see also [31]).

Proposition 1. Consider the RRW where X is i.i.d. with $E[X_0] < 0$ and $E[X_0^2] < \infty$. Then
the Markov chain W has a unique invariant probability measure with finite steady-state mean
$\overline{W}$, and the simulated averages have the following properties.

(i) The lower error-probability decays exponentially: For each $r < \overline{W}$,

$$\limsup_{n \to \infty} \frac{1}{n} \log P\{\overline{W}_n < r\} < 0.$$

(ii) The upper error-probability decays sub-exponentially: For each $r > \overline{W}$,

$$\lim_{n \to \infty} \frac{1}{n} \log P\{\overline{W}_n > r\} = 0.$$

This paper takes a deeper look at the latter phenomenon, providing a detailed understanding
of why it is hard to simulate the mean position of a RRW and, therefore, why care must
be taken drawing deductions regarding average queueing performance from the output of
a simulation. We establish that, in broad generality, the process $\{n^{-2} \sum_{i=1}^{n} W_i\}$
satisfies a Large Deviation Principle (LDP) with a non-trivial rate function. As a consequence, the
likelihood that the sample-mean estimate of a RRW is an overestimate decays on a slower
than exponential scale. The rate function in question is non-convex and this LDP could not,
therefore, be established by asymptotic analysis of scaled cumulant generating functions,
an approach commonly employed in queueing theory and used in the Gärtner-Ellis method.
Unlike the most likely paths of the RRW that lead to a large position, which are piece-wise
linear [1][17], we ascertain that the most likely paths associated with a large simulated
mean possess more complex features: they are concave, with a possible discontinuity when
the path first becomes deviant. A number of examples of these general results are presented
to demonstrate the range of qualitative possibilities.

The results contained in this article clearly indicate that significant statistical care must be 
taken when using estimates from simulation of the mean position of a RRW. This has serious 
implications for the performance evaluation of queueing systems by simulation techniques 
where steady state expected queue-length and waiting time are key performance metrics. 
Our results show, in broad generality, that the most natural estimation scheme, the Monte Carlo
estimate of these expected values, suffers a likelihood of over-estimation that is, approximately
speaking, Weibull-like with shape parameter 1/2. Consequently, naive estimation of these
quantities from simulation is likely to underestimate system performance.
As a concrete illustration of these general results, one example in which the rate function and
most likely paths are explicitly computable can be found in the following proposition.

Proposition 2. Consider the RRW in which the increments process X consists of i.i.d. Gaussian
random variables with mean $-\delta < 0$ and variance $\sigma^2$. Then $\{n^{-1} \overline{W}_n\}$
satisfies the LDP in $[0, \infty)$ with rate function

$$I_{\overline{W}}(z) = \begin{cases} \dfrac{4\delta}{\sigma^2} \sqrt{\dfrac{\delta z}{6}} & \text{if } z \in [0, \delta/6], \\[2ex] \dfrac{3}{2\sigma^2} \left(z + \dfrac{\delta}{2}\right)^2 & \text{if } z \in [\delta/6, \infty). \end{cases} \tag{2}$$

As n tends to infinity, the most likely paths of $n^{-1} W_{\lfloor nt \rfloor}$ leading to
$\overline{W}_n \ge nz$, which we denote $\psi^*$, are as follows.

(i) If $z \in (0, \delta/6]$, then for any $T_0 \in [0, 1 - \sqrt{6z/\delta}]$,

$$\psi^*(t) = \begin{cases} 0 & \text{if } t \in [0, T_0] \cup [T_0 + \sqrt{6z/\delta}, 1], \\ \delta (t - T_0) - \delta \sqrt{\dfrac{\delta}{6z}} \, (t - T_0)^2 & \text{for } t \in [T_0, T_0 + \sqrt{6z/\delta}]. \end{cases}$$

(ii) If $z \in [\delta/6, \infty)$, then

$$\psi^*(t) = 3 \left(z + \frac{\delta}{2}\right) \left(t - \frac{t^2}{2}\right) - \delta t \quad \text{for } t \in [0, 1].$$

We return to this example in Section 4.1, where the rate function in equation (2) and two
most likely paths are illustrated in Figure 4.
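The closed forms in Proposition 2 are easy to evaluate numerically. The sketch below is illustrative only: the function names, the default values $\delta = \sigma = 1$ and the midpoint-rule step count are assumptions for the example. It codes the rate function (written as $\frac{4\delta}{\sigma^2}\sqrt{\delta z/6}$ for $z \le \delta/6$ and $\frac{3}{2\sigma^2}(z + \delta/2)^2$ otherwise) together with the most likely path, so that one can check that the path encloses area $z$ and that the two branches of the rate function agree at $z = \delta/6$.

```python
import math

def rate_function(z, delta=1.0, sigma=1.0):
    """Rate function of Proposition 2 for Gaussian increments."""
    if z <= delta / 6:
        return (4 * delta / sigma**2) * math.sqrt(delta * z / 6)
    return (3 / (2 * sigma**2)) * (z + delta / 2) ** 2

def most_likely_path(t, z, delta=1.0, t0=0.0):
    """Most likely path psi*(t); t0 is the free start time T_0 when z <= delta/6."""
    if z <= 0:
        return 0.0
    if z <= delta / 6:
        width = math.sqrt(6 * z / delta)   # length of the excursion away from zero
        s = t - t0
        if s < 0 or s > width:
            return 0.0
        return delta * s - delta * math.sqrt(delta / (6 * z)) * s**2
    return 3 * (z + delta / 2) * (t - t**2 / 2) - delta * t

def area(z, delta=1.0, steps=20000):
    """Midpoint-rule approximation of the integral of psi* over [0, 1]."""
    h = 1.0 / steps
    return sum(most_likely_path((k + 0.5) * h, z, delta) for k in range(steps)) * h
```

For instance, `area(1/7)` and `area(1/3)` should return values close to 1/7 and 1/3 respectively, reflecting the constraint that the rescaled simulated mean equals $z$.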

Proposition 2 and other results that follow concern asymptotics of the doubly scaled sum
$n^{-1} \overline{W}_n = n^{-2} \sum_{i=1}^{n} W_i$. The $n^2$ scaling is similar to [1, Theorem 4.1],
concerning asymptotics for the GI/G/1 queue in the light tailed setting. This result proves that
the tail of the busy time distribution decays more slowly than exponentially:
$\lim_{n \to \infty} n^{-1} \log P(B > n^2 z) = -K \sqrt{z}$ for each $z > 0$ and some $K > 0$, where B
denotes the busy time in steady-state. The form of the limit can be predicted through scaling
arguments: If $\{n^{-2} B\}$ satisfies the LDP, it must do so with a rate function of the form
$K \sqrt{z}$, as can be seen by considering the substitution $m = n \sqrt{z/y}$. The rate function
for the large deviations of the process $\{n^{-1} \overline{W}_n\}$ is necessarily more complex and,
in general, the rate function will diverge more rapidly than $\sqrt{z}$ as $z \to \infty$.

The rest of this paper is organized as follows. In Section 2 we prove in broad generality that
the sample paths of the rescaled simulated mean $n^{-1} \overline{W}_n$ satisfy a functional LDP.
Using this LDP, in Section 3 we characterize properties of the most likely paths of the RRW given
that the rescaled simulated mean is large. In Section 4 we present examples including the RRW
with i.i.d. Gaussian increments, the M/D/1 queue, the D/M/1 queue and the M/M/1 queue,
as these exhibit the full range of theoretically possible behaviors.
2. Functional LDP for the rescaled simulated mean position

We assume that the reader is familiar with the basics of large deviation theory, such as the
definition of the LDP and the statement of the Contraction Principle, as can be found in
HASH]. The notation in this paper is as follows. Let $C[0, 1]$ denote the set of continuous
$\mathbb{R}$-valued functions on $[0, 1]$ equipped with the topology induced by the supremum norm,
$\|\phi\| = \sup_{t \in [0,1]} |\phi(t)|$. Let $D[0, 1]$ denote the space of $\mathbb{R}$-valued
càdlàg functions on $[0, 1]$ equipped with the Skorohod ($J_1$) topology [39][3][40] induced by the
following metric: for any two functions $\phi, \psi \in D[0, 1]$, define

$$d(\phi, \psi) := \inf_{\lambda \in \Lambda} \{\max(\|\phi \circ \lambda - \psi\|, \|\lambda - e\|)\},$$

where e is the identity ($e(t) = t$), and $\Lambda$ is the set of strictly increasing functions
$\lambda$ from $[0, 1]$ to $[0, 1]$ that are continuous, with a continuous inverse. Finally, let
$L[0, 1] \subset D[0, 1]$ denote the set of functions that have finite variation. Each
$\phi \in L[0, 1]$ has a Lebesgue decomposition with respect to Lebesgue measure whose absolutely
continuous part we denote $\phi^{(a)}$ and whose singular component we denote $\phi^{(s)}$, so that
$\phi(t) = \int_0^t \phi^{(a)}(s) \, ds + \phi^{(s)}(t)$. Furthermore, we decompose $\phi^{(s)}$
into its positive $\phi^{\uparrow}$ and negative $\phi^{\downarrow}$ parts by the Hahn Decomposition
Theorem.

For each $n \in \mathbb{N}$ and all $t \in [0, 1]$, we define the following scaled sample paths:

$$x_n(t) := \frac{1}{n} \sum_{i=0}^{\lfloor nt \rfloor - 1} X_i, \quad w_n(t) := \frac{1}{n} W_{\lfloor nt \rfloor} \quad \text{and} \quad \overline{w}_n(t) := \frac{1}{n^2} \sum_{i=1}^{\lfloor nt \rfloor} W_i + \frac{1}{n^2} (nt - \lfloor nt \rfloor) W_{\lfloor nt \rfloor + 1}.$$

The first two of these are elements of $D[0, 1]$ and correspond to the paths for the simulated
position of the unconstrained random walk and for the simulated position of the RRW,
respectively. The sample path $\overline{w}_n$ is an element of $C[0, 1]$ and is the polygonally
approximated continuous path for the rescaled simulated mean location of the RRW. In particular,
note that $\overline{w}_n(1) = n^{-1} \overline{W}_n = n^{-2} \sum_{i=1}^{n} W_i$ is the rescaled
simulated mean of the RRW.
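To make the three scalings concrete, the following sketch evaluates the unconstrained walk path, the RRW path and the polygonal mean path at a given time from a list of increments. It is an illustration under stated assumptions: the function name `scaled_paths` is hypothetical, and the input is taken to hold $n + 1$ increments $X_0, \ldots, X_n$ so that $W_{n+1}$ is available for the interpolation term at $t = 1$.

```python
import math

def scaled_paths(xs, t):
    """Evaluate x_n(t), w_n(t) and the polygonal mean path at time t in [0, 1].

    xs holds the increments X_0, ..., X_n (one more than n).
    """
    n = len(xs) - 1
    w = [0.0]
    for x in xs:
        w.append(max(w[-1] + x, 0.0))        # w[k] = W_k via Lindley's recursion
    k = math.floor(n * t)
    x_n = sum(xs[:k]) / n                    # (1/n) * (X_0 + ... + X_{k-1})
    w_n = w[k] / n                           # (1/n) * W_k
    mean_n = sum(w[1:k + 1]) / n**2 + (n * t - k) * w[k + 1] / n**2
    return x_n, w_n, mean_n
```

At $t = 1$ the third component reduces to $n^{-2} \sum_{i=1}^{n} W_i$, the rescaled simulated mean.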

For the general qualitative theorem we make the following assumption. 

Assumption 1. The sample paths for the unconstrained random walks $\{x_n\}$ satisfy the LDP
in $D[0, 1]$ with good rate function $I_X$.

This assumption is known to hold for a large collection of processes. For example, if X is
an i.i.d. sequence, then define $\theta^{\downarrow} := \sup\{\theta > 0 : E[\exp(-\theta X_0)] < \infty\}$
and $\theta^{\uparrow} := \sup\{\theta > 0 : E[\exp(\theta X_0)] < \infty\}$. If
$\min(\theta^{\downarrow}, \theta^{\uparrow}) > 0$, then by Cramér's Theorem [H]j3] the partial sums
$\{x_n(1)\}$ satisfy the LDP in $\mathbb{R}$ with the good, convex (local) rate function

$$I(y) := \sup_{\theta} \left(\theta y - \log E[\exp(\theta X_0)]\right), \quad \text{for } y \in \mathbb{R}, \tag{3}$$

and Mogul'skii's Theorem [33] proves that Assumption 1 holds true. The rate function is
typically of the form (e.g. QQHH):

$$I_X(\gamma) = \begin{cases} \int_0^1 I(\gamma^{(a)}(s)) \, ds + \theta^{\uparrow} \gamma^{\uparrow}(1) + \theta^{\downarrow} \gamma^{\downarrow}(1) & \text{if } \gamma \in L[0, 1], \\ +\infty & \text{otherwise,} \end{cases} \tag{4}$$

where $\theta^{\downarrow} \gamma^{\downarrow}(1) := 0$ if $\theta^{\downarrow} = \infty$ and
$\gamma^{\downarrow}(1) = 0$, and $\theta^{\uparrow} \gamma^{\uparrow}(1) := 0$ if
$\theta^{\uparrow} = \infty$ and $\gamma^{\uparrow}(1) = 0$. Dembo and Zajic 0] have generalized
Mogul'skii's Theorem to include sequences X that need not be i.i.d., but that satisfy a uniform
super-exponential tail condition that ensures that the generalization of
$\min(\theta^{\downarrow}, \theta^{\uparrow})$ is $+\infty$, as well as a mixing condition that
encompasses, for example, Markov chains that are uniformly ergodic. The resulting rate function for
these processes is also of the form in equation (4), but the cumulant generating function
$\log E[\exp(\theta X_0)]$ in equation (3) is replaced with the scaled cumulant generating function
$\lim_{n \to \infty} n^{-1} \log E[\exp(\theta (X_0 + \cdots + X_{n-1}))]$.
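The local rate function in equation (3) is a Legendre transform, which can be approximated numerically when no closed form is at hand. The sketch below is an illustration, not the authors' method: `legendre_transform` uses a ternary search over a bounded interval of $\theta$ (an assumption that the maximizer lies in $[-50, 50]$), and `gaussian_cgf` is the cumulant generating function of a Gaussian increment with mean $-\delta$ and variance $\sigma^2$, for which the transform is known to equal $(y + \delta)^2 / (2\sigma^2)$.

```python
def legendre_transform(cgf, y, lo=-50.0, hi=50.0, iters=200):
    """Approximate sup_theta (theta*y - cgf(theta)); the objective is concave in theta."""
    def objective(theta):
        return theta * y - cgf(theta)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1          # the maximizer lies to the right of m1
        else:
            hi = m2          # the maximizer lies to the left of m2
    return objective((lo + hi) / 2)

def gaussian_cgf(theta, delta=1.0, sigma=1.0):
    """log E[exp(theta * X_0)] for X_0 Gaussian with mean -delta and variance sigma^2."""
    return -delta * theta + 0.5 * sigma**2 * theta**2
```

Comparing `legendre_transform(gaussian_cgf, y)` with $(y + 1)^2 / 2$ for a few values of `y` gives a quick sanity check of the routine before applying it to other increment distributions.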

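Theorem 3. The following hold under Assumption 1.

(i) The sequence of rescaled paths of the simulated mean of the RRW $\{\overline{w}_n\}$ satisfies the
LDP in $C[0, 1]$ with rate function

$$I_{\overline{W}}(\phi) = \inf \left\{ I_X(\gamma) : \int_0^t \sup_{0 \le s \le u} (\gamma(u) - \gamma(s)) \, du = \phi(t) \text{ for all } t \in [0, 1] \right\}.$$

(ii) If $I_X$ is of the form in equation (4), then $I_{\overline{W}}$ is only finite at those
functions $\phi$ such that $\dot{\phi}$ exists, $\dot{\phi}$ is non-negative and $\dot{\phi}$ is an
element of $L[0, 1]$, in which case

$$I_{\overline{W}}(\phi) = \int_0^1 \left( I(\dot{\phi}^{(a)}(s)) 1_{\{\dot{\phi}(s) > 0\}} + \inf_{y \le 0} I(y) 1_{\{\dot{\phi}(s) = 0\}} \right) ds + \theta^{\uparrow} \dot{\phi}^{\uparrow}(1) + \theta^{\downarrow} \dot{\phi}^{\downarrow}(1). \tag{5}$$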

Proof. The proof of the first assertion follows from the contraction principle (e.g. [17, Theorem
4.2.16]) after noting the following. The Skorohod map, $f(\gamma)(t) = \gamma(t) - \inf_{s \le t} \gamma(s)$,
is continuous from $D[0, 1]$ to $D[0, 1]$ (e.g. [40, Theorem 13.5.1]) and $f(x_n)(t) = w_n(t)$. The
integration map, $g(\psi)(t) = \int_0^t \psi(s) \, ds$, is continuous from $D[0, 1]$ to $C[0, 1]$
(e.g. [40, Theorem 11.5.1]) and $g(w_n)(t) = \overline{w}_n(t)$.

For the second assertion, if $\phi$ is such that $\dot{\phi}$ does not exist, takes negative values
or is not an element of $L[0, 1]$, then $I_{\overline{W}}(\phi) = +\infty$, as can be seen from the
contraction principle. If $\dot{\phi}$ exists, is non-negative and is an element of $L[0, 1]$, then

$$I_{\overline{W}}(\phi) = \inf_{\gamma \in L[0, 1]} \left\{ \int_0^1 I(\gamma^{(a)}(s)) \, ds + \theta^{\uparrow} \gamma^{\uparrow}(1) + \theta^{\downarrow} \gamma^{\downarrow}(1) : \sup_{0 \le s \le t} (\gamma(t) - \gamma(s)) = \dot{\phi}(t) \text{ for all } t \in [0, 1] \right\}.$$

If $\dot{\phi}(t) > 0$, then $\gamma$ must satisfy $\dot{\phi}^{(a)}(t) = \gamma^{(a)}(t)$. As
$\dot{\phi}^{(a)}(t) = 0$ for almost all t such that $\dot{\phi}(t) = 0$, if $\dot{\phi}(t) = 0$ we are
free to choose $\gamma^{(a)}(t)$ to attain $\inf_{y \le 0} I(y)$ and so minimize the rate function.
The singular parts $\dot{\phi}^{\uparrow}$ and $\dot{\phi}^{\downarrow}$ must be mimicked by
$\gamma^{(s)}$. In order to minimize the rate function, $\gamma^{(s)}$ is unchanging everywhere else,
leading to the result. □

The rate function in equation (5) can be understood as follows. In order to see the rescaled
simulated mean sample path $\phi$, in the integral one must locally pay for changes in the
increments process so long as the location is positive. If the location is zero, then the increments
can take their most likely value less than or equal to zero. The singular parts of the location 
are matched by singular parts in the increments. 
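The two continuous maps used in the proof have simple discrete analogues. The sketch below is illustrative only, with hypothetical function names: `reflect` applies the Skorohod map to a path sampled at integer times, `lindley` iterates recursion (1) directly, and `running_integral` forms left Riemann sums of a piecewise-constant path. Applying `reflect` to the partial sums of the increments reproduces the RRW, which is the discrete counterpart of $f(x_n) = w_n$.

```python
def partial_sums(xs):
    """Unconstrained random walk: [0, X_0, X_0 + X_1, ...]."""
    s = [0.0]
    for x in xs:
        s.append(s[-1] + x)
    return s

def reflect(path):
    """Skorohod map on a sampled path: gamma(k) - min_{j <= k} gamma(j)."""
    out, running_min = [], float("inf")
    for g in path:
        running_min = min(running_min, g)
        out.append(g - running_min)
    return out

def lindley(xs):
    """Reflected random walk from Lindley's recursion, started at zero."""
    w = [0.0]
    for x in xs:
        w.append(max(w[-1] + x, 0.0))
    return w

def running_integral(path, h):
    """Left Riemann sums of a piecewise-constant path with time step h."""
    total, out = 0.0, [0.0]
    for v in path[:-1]:
        total += v * h
        out.append(total)
    return out
```

Composing `running_integral` with `reflect` mirrors the composition $g \circ f$ to which the contraction principle is applied.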



3. Most likely RRW paths to a large simulated mean position 

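Considering Theorem 3 in conjunction with the contraction principle and the projection
$\phi \mapsto \phi(1)$, roughly speaking, we can deduce that

$$P\left(n^{-1} \overline{W}_n \approx z\right) \asymp \exp\left(-n \inf\{I_{\overline{W}}(\phi) : \phi(1) = z\}\right).$$

Thus consider the following minimization problem:

$$\text{minimize } I_{\overline{W}}(\phi) \quad \text{subject to } \phi(1) = z.$$

If $I_X$ is of the form in equation (4), then this problem can be rewritten in terms of the fluid
limit paths of the RRW:

$$\text{minimize } J(\psi) \quad \text{subject to } \psi \in L^+[0, 1] \text{ and } \int_0^1 \psi(s) \, ds = z, \tag{6}$$

where $L^+[0, 1]$ is the set of non-negative elements of $L[0, 1]$ and the objective function is

$$J(\psi) := \int_0^1 \left( I(\psi^{(a)}(s)) 1_{\{\psi(s) > 0\}} + \inf_{y \le 0} I(y) 1_{\{\psi(s) = 0\}} \right) ds + \theta^{\uparrow} \psi^{\uparrow}(1) + \theta^{\downarrow} \psi^{\downarrow}(1). \tag{7}$$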

The evaluation of the optimization (6) and the identification of properties of its infimal
argument (or arguments) are the subject of the rest of this paper. That is, we wish to identify
properties of the most likely fluid simulated RRW paths that give rise to the simulated
mean position being unusually large. These optimizers are most likely paths in the sense
that if G is any measurable neighborhood of the set of minimizing arguments to (6) and
$\overline{G} = \{\phi : \phi(t) = \int_0^t \psi(s) \, ds \text{ for some } \psi \in G\}$, then
$\lim_{n \to \infty} P\{\overline{w}_n \in \overline{G} \mid \overline{w}_n(1) \ge z\} = 1$. This follows,
for example, by [24, Theorem 2.2].
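For intuition about the optimization problem, the objective and the area constraint can both be approximated on a grid. The sketch below is a simplified illustration under stated assumptions: it covers only continuous paths (the jump terms of the objective are omitted), the function name and arguments are hypothetical, and `inf_rate_nonpos` stands for $\inf_{y \le 0} I(y)$, which is zero when the increments have negative drift.

```python
def discretised_cost_and_area(psi, local_rate, inf_rate_nonpos=0.0):
    """Riemann approximations of the objective J and of the area of a path.

    psi: values of a continuous, non-negative path on a uniform grid over [0, 1].
    local_rate: the local rate function I, applied to the slope of the path.
    """
    h = 1.0 / (len(psi) - 1)
    cost = area = 0.0
    for a, b in zip(psi[:-1], psi[1:]):
        if a > 0 or b > 0:
            cost += local_rate((b - a) / h) * h   # pay I(slope) while the path is positive
        else:
            cost += inf_rate_nonpos * h           # at zero the increments take their cheapest value
        area += 0.5 * (a + b) * h                 # trapezoid rule for the area constraint
    return cost, area
```

Feeding in a candidate path and the Gaussian local rate `lambda x: (x + 1) ** 2 / 2` returns a pair (cost, area) that can be compared across candidates with the same area.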

In addition to Assumption 1, the following assumption is in force throughout the rest of this
article.

Assumption 2. The rate function $I_X$ is of the form in equation (4), where I is a good,
convex rate function, and there exists $\delta > 0$ such that $I(-\delta) = 0$, so that the RRW is stable.

Note that as I is a rate function, it is lower semi-continuous. The maximal value for which
it is finite is denoted by

$$\bar{r} := \sup\{r : I(r) < \infty\}. \tag{8}$$

Suppose that I is a non-coercive function: $\bar{r} < \infty$ and $\lim_{r \uparrow \bar{r}} I(r) < \infty$.
Then the limit must coincide with $I(\bar{r})$, which is thus finite. This is needed to ensure the
existence of optimal paths. Note also that I being non-coercive is mutually exclusive with
$\theta^{\uparrow} < \infty$, which requires $I(r) < \infty$ for all positive r.

Theorem 4. An optimal solution to the optimization problem (6) exists, and any optimal
solution $\psi^*$ satisfies the following properties: There exist $0 \le T_0 < T_1 \le 1$ such that,

(i) $\psi^*(t) > 0$ on the open interval $(T_0, T_1)$ and $\psi^*(t) = 0$ for $t \in [0, 1] \setminus [T_0, T_1]$;

(ii) $\psi^*$ is concave on $[T_0, T_1]$;

(iii) $\psi^*$ is continuous on $(T_0, T_1]$, with a possible jump at $t = T_0$.

The proofs of this theorem and the two that follow are postponed to the end of this section. 

The time $T_0$ is taken to be the minimal time that a path is non-zero and $T_1$ the maximal
time that it is non-zero:

$$T_0 := \inf\{t \ge 0 : \psi(t) > 0\} \quad \text{and} \quad T_1 := \sup\{t \le 1 : \psi(t) > 0\}. \tag{9}$$

If $0 < \bar{r} < \infty$, then we define

$$T_0^{\circ} := \sup\{t \ge 0 : \dot{\psi}(t) = \bar{r}\}. \tag{10}$$

If the supremum is over an empty set then we take $T_0^{\circ} = T_0$; hence the inclusion
$T_0^{\circ} \in [T_0, T_1]$ follows from the definitions. The following theorem identifies the
structure of the most likely path for t between $T_0^{\circ}$ and $T_1$. The one that follows it
identifies how most likely paths must end.

Theorem 5. Let $0 \le T_0 \le T_0^{\circ} \le T_1 \le 1$ denote the values given in (9) and (10). Then for
$\psi^*$ to be an optimal path, there must exist constants $b \in \mathbb{R}$ and $\lambda^* > 0$ such that

$$\nabla I(\dot{\psi}^*(t)) = b - \lambda^* t \quad \text{for all } T_0^{\circ} < t < T_1. \tag{11}$$

In particular, if I is coercive, then $T_0^{\circ} = T_0$ and equation (11) is satisfied for all t such that
$\psi^*(t) > 0$.

Theorem 6. Suppose that $T_0^{\circ} < T_1$. Then $\frac{d}{dt} \psi^*(t)|_{t = T_1} = -\delta$ and
$b = \lambda^* T_1$ in equation (11).

As well as providing insight into the structure of the most likely paths, Theorems 4, 5 and 6
enable the reduction of the problem (6) from an infinite dimensional optimization to a finite
dimensional optimization problem that can be readily solved numerically, if not analytically.

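Proposition 7. Given $z > 0$, define the subset $S_z \subset L^+[0, 1]$ of potential solutions to be the
collection of functions $\psi^{\circ}$ such that, for some $0 \le T_0 < T_1 \le 1$ and
$T_0^{\circ} \in [T_0, T_1]$:

(i) $\psi^{\circ}(t) = 0$ for $t \in [0, T_0) \cup (T_1, 1]$;

(ii) if $T_1 < 1$, then $\psi^{\circ}(T_1) = 0$;

(iii) $\int_0^1 \psi^{\circ}(t) \, dt = z$;

(iv) if $T_0^{\circ} < T_1$, then $\frac{d}{dt} \psi^{\circ}(t)|_{t = T_1} = -\delta$;

(v) $\nabla I(\dot{\psi}^{\circ}(t)) = \lambda^*(T_1 - t)$ for all $t \in (T_0^{\circ}, T_1)$.

If I is coercive and $\theta^{\uparrow} = \infty$, then in addition to (i)-(v):

• $T_0^{\circ} = T_0$;

• $\psi^{\circ}$ has no discontinuities.

If I is coercive and $\theta^{\uparrow} < \infty$, then in addition to (i)-(v):

• $T_0^{\circ} = T_0$;

• if $\psi^{\circ}$ has a discontinuity, it is at $T_0$.

If I is non-coercive (which ensures that $\theta^{\uparrow} = \infty$), then in addition to (i)-(v):

• $\dot{\psi}^{\circ}(t) = \bar{r}$ for $t \in [T_0, T_0^{\circ})$;

• $\psi^{\circ}$ has no discontinuities.

The problem (6) is then equivalent to

$$\text{minimize } I(\bar{r}) (T_0^{\circ} - T_0) + \int_{T_0^{\circ}}^{T_1} I(\dot{\psi}^{\circ}(s)) \, ds + \psi^{\circ}(T_0) \theta^{\uparrow} \quad \text{subject to } \psi^{\circ} \in S_z. \tag{12}$$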
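To indicate how the finite dimensional problem might be solved numerically, the sketch below treats one special case only: Gaussian increments (local rate $(x + \delta)^2 / (2\sigma^2)$), $T_0 = 0$, no jumps, and a path that returns to zero at some $T = T_1 < 1$. The helper names, the bisection brackets and the step counts are assumptions made for the example. Condition (v) gives the slope $\sigma^2 \lambda (T - t) - \delta$, condition (ii) fixes $\lambda$ for each $T$, and condition (iii) then fixes $T$.

```python
def bisect(f, lo, hi, iters=60):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def candidate(T, lam, delta=1.0, sigma=1.0, steps=400):
    """Integrate the slope sigma^2*lam*(T - t) - delta on [0, T]; return (psi(T), area, cost)."""
    h = T / steps
    psi = area = cost = 0.0
    for k in range(steps):
        slope = sigma**2 * lam * (T - (k + 0.5) * h) - delta
        area += (psi + 0.5 * slope * h) * h                  # path value near the cell midpoint
        cost += (slope + delta) ** 2 / (2 * sigma**2) * h    # local rate of the slope
        psi += slope * h
    return psi, area, cost

def solve_reduced(z, delta=1.0, sigma=1.0):
    """Return (T, lam, cost) for the case in which the path returns to zero at T < 1."""
    def lam_for(T):   # condition (ii): psi(T) = 0
        return bisect(lambda lam: candidate(T, lam, delta, sigma)[0], 1e-9, 1e3)
    # condition (iii): the area under the path equals z
    T = bisect(lambda T: candidate(T, lam_for(T), delta, sigma)[1] - z, 0.01, 1.0)
    lam = lam_for(T)
    return T, lam, candidate(T, lam, delta, sigma)[2]
```

With $\delta = \sigma = 1$ and $z = 1/7$, the returned values can be compared with the closed forms $T = \sqrt{6z/\delta}$ and $\sigma^2 \lambda^* = 2\delta / T$ obtained for the Gaussian example.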
After proving Theorems 4, 5 and 6, in Section 4 we will use the reduced representation of the
problem defined in equation (12) in the consideration of illustrative examples.

The following lemma will be used to establish properties of an optimal fluid trajectory.

Lemma 8. Suppose that $\psi^{\circ}$ is a fluid trajectory satisfying $z(0) := \int_0^1 \psi^{\circ}(s) \, ds < \infty$.

(i) For any $d \ge 0$ and $t \in [0, 1]$ define $\psi^d(t) = \max(0, \psi^{\circ}(t) - d)$, and
$z(d) := \int_0^1 \psi^d(s) \, ds$. Then $z(\cdot)$ is convex and non-increasing as a function of d.

(ii) If $\psi^{\circ\downarrow}(1) = 0$ then $J(\psi^d)$ is non-increasing as a function of d.

Proof. For each t, the function of d given by $\psi^d(t) = \max(0, \psi^{\circ}(t) - d)$ is convex and
non-increasing. It follows that its integral over time is also convex and non-increasing.

Part (ii) then follows from the definition of J given in equation (7). □
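The truncation used in Lemma 8 is straightforward to experiment with. In the sketch below, which is illustrative only, a path is represented by its values at the midpoints of a uniform grid on $[0, 1]$; `truncated_area` computes $z(d)$ as a Riemann sum and `level_for_area` finds, by bisection, the level $d$ at which the truncated path has a prescribed area. Both function names are hypothetical.

```python
def truncated_area(psi, d):
    """z(d): Riemann sum of max(0, psi - d), with psi sampled at grid-cell midpoints of [0, 1]."""
    h = 1.0 / len(psi)
    return sum(max(0.0, v - d) for v in psi) * h

def level_for_area(psi, z, iters=80):
    """Find d >= 0 with z(d) = z, using that z(.) is continuous and non-increasing."""
    lo, hi = 0.0, max(psi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if truncated_area(psi, mid) > z:
            lo = mid      # still too much area: truncate further
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

This is the operation invoked repeatedly in the proofs that follow: a modified trajectory with too much area is lowered until its area is exactly $z$, without increasing its cost.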



Proof of Theorem 4. We first establish the existence of an optimizer, which follows from
topological arguments. The objective function $J : D[0, 1] \to [0, \infty]$ is defined for elements of
$L^+[0, 1]$ in equation (7); set $J(\psi) = +\infty$ for $\psi \notin L^+[0, 1]$. The function J is lower
semi-continuous and has compact level sets as it is the good rate function for the LDP of the
sample path process $\{w_n\}$. With domain $D[0, 1]$, the mapping $\psi \mapsto \int_0^1 \psi(s) \, ds$ is
continuous (e.g. [40, Theorem 11.5.1]), so that the set $\{\psi \in D[0, 1] : \int_0^1 \psi(s) \, ds = z\}$ is
closed. In a Hausdorff space, the infimum of a lower semi-continuous function with compact level
sets is attained on closed sets (e.g. [17, Lemma 4.1]). Thus if the infimum in (6) is finite, it is
attained at some $\psi^* \in L^+[0, 1]$ such that $\int_0^1 \psi^*(s) \, ds = z$.

Regarding the properties of an optimizer $\psi^*$, note first that it is obvious that
$\psi^{\downarrow}(1) = 0$: By removing downward jumps we reduce $J(\psi)$ while increasing the area
$\int_0^1 \psi(s) \, ds$. On letting $\psi^{\circ}$ denote the new trajectory, and setting
$z(d) := \int_0^1 \psi^d(s) \, ds$, Lemma 8 then implies that $z(d) = z$ for some $d \ge 0$, with
$J(\psi^d) \le J(\psi)$.

We assume without loss of generality that the closure of $\{t : \psi(t) > 0\}$ is equal to the interval
$[T_0, T_1]$ (where the endpoints are defined in (9)): If there exist times $t_0 < t_1$ satisfying
$t_0 > T_0$, $t_1 < T_1$, and $\psi(t) = 0$ for $t \in (t_0, t_1)$, then the trajectory can be shifted as
follows,

$$\psi^{\circ}(t) = \begin{cases} \psi(t) & t \in [T_0, t_0], \\ \psi(t + (t_1 - t_0)) & t \in [t_0, 1 - (t_1 - t_0)], \\ \max(0, \psi(1) + (1 - (t_1 - t_0) - t) \delta) & t \in (1 - (t_1 - t_0), 1]. \end{cases}$$

Once again, on setting $z(d) := \int_0^1 \psi^d(s) \, ds$, we have $z(0) \ge z$, and on applying Lemma 8 we
have $z(d) = z$ for some $d \ge 0$, with $J(\psi^d) \le J(\psi)$.
Next, we assume without loss of generality that $T_0 = 0$: We can replace $\psi$ by

$$\psi^{\circ}(t) = \begin{cases} \psi(t + T_0) & t \in [0, 1 - T_0], \\ \max(0, \psi(1) + (1 - T_0 - t) \delta) & t \in (1 - T_0, 1]. \end{cases}$$

An application of Lemma 8 again shows that $J(\psi^d) \le J(\psi^{\circ}) = J(\psi)$, and
$z(d) = z = \int_0^1 \psi(t) \, dt$ for some $d \ge 0$.

We can now prove (iii): Figure 1 illustrates why a jump following time $T_0$ cannot be optimal.
A formal proof can be performed through construction as in the previous steps. We define,
for any $\psi$, the new trajectory with $\psi^{\circ}(0) = \psi(0)$, and

$$\psi^{\circ}(t) = \psi^{\uparrow}(1) + \int_0^t \psi^{(a)}(s) \, ds, \quad 0 < t \le 1.$$

We have $J(\psi^{\circ}) = J(\psi)$, and $\int_0^1 \psi^{\circ}(s) \, ds \ge \int_0^1 \psi(s) \, ds$. Applying Lemma 8
we have $z(d) = z$ for some $d \ge 0$, with $J(\psi^d) \le J(\psi)$. This proves (iii).



Figure 1: A jump for $t > T_0$ cannot be optimal. The rate functional evaluated at the two paths
$\psi$ and $\psi^{\circ}$ is the same, yet the area is greater using $\psi^{\circ}$.

Similar reasoning establishes concavity of an optimal path: Figure 2 shows a transformation
of a given trajectory to form a new trajectory with reduced value $J(\psi^{\circ})$, but strictly greater
area. Applying Lemma 8 we obtain (ii). Part (i) then follows from (ii). □

Figure 2: An optimal path is concave on $(T_0, T_1)$. Convexity of $I(r)$ implies that
$J(\psi^{\circ}) \le J(\psi)$, yet the area is greater using $\psi^{\circ}$.
Proof of Theorem 5. From Theorem 4, if $\psi^*$ is an optimal solution for (6) then it is
continuous apart from at $T_0$; if $T_1 < 1$, then $\psi^*(T_1) = 0$. Thus (6) can be considered as
identifying

$$\inf_{T_0, T_0^{\circ}, T_1} \ \inf_{\psi(T_0)} \ \inf_{\psi} \left\{ (T_0^{\circ} - T_0) I(\bar{r}) + \int_{T_0^{\circ}}^{T_1} I(\dot{\psi}(t)) \, dt + \theta^{\uparrow} \psi(T_0) : (T_0^{\circ} - T_0)^2 \frac{\bar{r}}{2} + (T_1 - T_0) \psi(T_0) + \int_{T_0^{\circ}}^{T_1} \psi(t) \, dt = z, \ \psi(T_1) = 0 \text{ if } T_1 < 1 \right\}. \tag{13}$$

For fixed $T_0$, $\psi(T_0)$, $T_1$ and $T_0^{\circ}$, we are left to consider finding the solution of a
problem of the following kind:

$$\text{minimize } \int_0^T I(\dot{\psi}(t)) \, dt \quad \text{subject to } \int_0^T \psi(s) \, ds = z'.$$
If $\psi$ is a feasible path, then integration by parts gives

$$\int_0^T t \dot{\psi}(t) \, dt = T \psi(T) - z'. \tag{14}$$

Introduce the Lagrangian

$$\mathcal{L}(\psi, \lambda) = \int_0^T I(\dot{\psi}(t)) \, dt + \lambda \left( \int_0^T t \dot{\psi}(t) \, dt + z' - T \psi(T) \right).$$

There exists $\lambda = \lambda^*$ so that complementary slackness holds. Hence the optimizer $\psi^*$ of (6)
also minimizes $\mathcal{L}(\psi, \lambda^*)$ over all $\psi$. The constant $\lambda^*$ exists by
[27, Theorem 1 of Section 8.3], which only requires feasibility of (14) for $z''$ in a neighborhood of
$z'$ (which is true when $T_0^{\circ} < T_1$; if $T_0^{\circ} = T_1$ then there is nothing to prove).

If $\psi^*$ minimizes the Lagrangian, and if $\eta$ represents a perturbation satisfying $\eta(t) = 0$ for
$t \in (T_0^{\circ}, T_1)^c$, then,

$$0 = \frac{d}{d\theta} \mathcal{L}(\psi^* + \theta \eta, \lambda^*) \Big|_{\theta = 0} = \int_0^T \left[\nabla I(\dot{\psi}^*(t)) + \lambda^* t\right] \dot{\eta}(t) \, dt.$$

It follows that there exists a constant b such that

$$\nabla I(\dot{\psi}^*(t)) = b - \lambda^* t \quad \text{for a.e. } t \in (T_0^{\circ}, T_1).$$

Returning to problem (13), this implies that irrespective of the optimal values of $T_0$, $\psi(T_0)$,
$T_0^{\circ}$ or $T_1$, the optimal path satisfies $\nabla I(\dot{\psi}^*(t)) = b - \lambda^* t$ for
$t \in (T_0^{\circ}, T_1)$. □
Proof of Theorem 6. Figure 3 illustrates the idea of the proof in the special case $T_1 < 1$.
Let $\bar{\delta} = -\frac{d}{dt} \psi(t)|_{t = T_1}$, and suppose that $\bar{\delta} < \delta$. We will construct a
new trajectory $\psi^{\circ}$ with a lower value of $J(\psi^{\circ})$, and increased area. An application of
Lemma 8 will then show that $\psi$ cannot be optimal.

We henceforth assume without loss of generality that $T_0 = 0$. Concavity of $\psi$ on $[0, T_1]$
implies that $\frac{d}{dt} \psi(t) \ge -\bar{\delta}$ for all $t \in (T_0, T_1)$. The main idea of the proof (as
illustrated in the figure) is as follows: For given $\epsilon > 0$, the cost contribution to $J(\psi)$ over
$[T_1 - \epsilon, T_1]$ is greater than $\epsilon I(-\bar{\delta}) = O(\epsilon)$ if $\bar{\delta} < \delta$.
However, the additional area obtained is bounded by $O(\epsilon^2)$.

We let $-\delta^{\circ}$ denote the right derivative, following a possible jump:

$$\delta^{\circ} = -\lim_{t \downarrow 0} \frac{d}{dt} \psi(t).$$

For fixed $\epsilon > 0$, $b > 0$, let $t_0 = b \epsilon^2$, and let $\psi^{\circ}$ denote the concave function
defined by $\psi^{\circ}(0+) = \psi(0+)$, with derivatives for $t > 0$ defined as follows:

$$\frac{d}{dt} \psi^{\circ}(t) = \begin{cases} -\delta^{\circ} & t \in (0, t_0], \\ \frac{d}{dt} \psi(t - t_0) & t \in (t_0, T_1 + t_0 - \epsilon], \\ -\delta & t > T_1 + t_0 - \epsilon \ \text{(provided } \psi^{\circ}(t) > 0\text{)}. \end{cases} \tag{15}$$

For b sufficiently large, we have $\int_0^1 \psi^{\circ}(s) \, ds > \int_0^1 \psi(s) \, ds$ for all $\epsilon > 0$
sufficiently small. For the same constant b we also have
$J(\psi^{\circ}) \le J(\psi) - O(\epsilon) + O(\epsilon^2)$, so that $J(\psi^{\circ}) < J(\psi)$ for
sufficiently small $\epsilon > 0$. Fixing b and $\epsilon$ so that these bounds hold, we then apply
Lemma 8 to conclude that $z(d) = z$ for some $d \ge 0$, with $J(\psi^d) \le J(\psi^{\circ}) < J(\psi)$. The
second statement of the theorem follows from $\frac{d}{dt} \psi^*(t)|_{t = T_1} = -\delta$ and Theorem 5
on noting that $\nabla I(-\delta) = 0$.

□



4. Examples 

4.1. Coercive rate function: continuous paths

We present two examples with coercive rate functions. One is the RRW with Gaussian
increments. Here the rate function and most likely path can be determined explicitly. The
RRW in the second example corresponds to the queue-length at departures of an M/D/1
queue with batch services. In this case, identification of the rate function and the most likely
paths requires the solution of two transcendental equations, which can be readily obtained
numerically.

Gaussian increments. Let X be i.i.d. Gaussian, with $X_0$ having mean $-\delta < 0$ and variance
$\sigma^2$. Then the conditions of Mogul'skii's Theorem are met with
$\theta^{\downarrow} = \theta^{\uparrow} = +\infty$ and the local rate function is

$$I(x) = \frac{1}{2\sigma^2} (x + \delta)^2.$$

As $\theta^{\downarrow} = \theta^{\uparrow} = +\infty$ the sample path rate function $I_X$ is only finite at
absolutely continuous functions.

Without loss of generality, assume that $T_0 = 0$ and define $T = T_1$ (so that T represents
$T_1 - T_0$). By Proposition 7, to solve the problem (12) for a given $z > 0$, for each $T \in (0, 1]$
we first identify paths $\psi^{\circ}$ that satisfy $\nabla I(\dot{\psi}^{\circ}(t)) = \lambda^*(T - t)$ in
$[0, T]$ and $\dot{\psi}^{\circ}(T) = -\delta$, which leads to candidate solutions satisfying

$$\dot{\psi}^{\circ}(t) = \sigma^2 \lambda^* (T - t) - \delta.$$

If $T < 1$, then in addition we have that $\psi^{\circ}(T) = 0$ and $\int_0^T \psi^{\circ}(t) \, dt = z$, giving
$\sigma^2 \lambda^* = 2\delta/T$ and $T = \sqrt{6z/\delta}$. Note that $T < 1$ only if $6z < \delta$ and
therefore the optimal path is

$$\psi^*(t) = \delta t - \delta \sqrt{\frac{\delta}{6z}} \, t^2 \quad \text{for } t \in [0, T] \text{ if } z < \delta/6. \tag{16}$$

If $T = 1$, then we have that $c := \psi^{\circ}(1) \ge 0$ and $\int_0^1 \psi^{\circ}(t) \, dt = z$, giving
$\sigma^2 \lambda^* = 2(c + \delta)$ and $c = \frac{3}{2}(z - \delta/6)$. Note that $c \ge 0$ only if
$6z \ge \delta$ and therefore the optimal path is

$$\psi^*(t) = 3 \left(z + \frac{\delta}{2}\right) \left(t - \frac{t^2}{2}\right) - \delta t \quad \text{for } t \in [0, 1] \text{ if } z \ge \delta/6. \tag{17}$$

Evaluating $\int_0^T I(\dot{\psi}^*(t)) \, dt$, for $\psi^*$ defined in equations (16) and (17), we obtain
the rate function $I_{\overline{W}}(z)$ presented in equation (2), Proposition 2, which can be found in
Section 1.

With $\delta = 1$ and $\sigma^2 = 1$, the rate function defined in equation (2) is plotted on the left in
Figure 4. It is concave for $z < 1/6$ and then convex for $z > 1/6$. Thus it is not possible that
this LDP could be proved, or its rate function identified, by Gärtner-Ellis style methods that
rely on convexity. The transition at $z = 1/6$ occurs when the most likely paths change from
returning to zero within the interval $[0, 1]$ to paths that end with a non-zero position. This
explains the dramatic change in shape of the rate function at that point.

Two most likely fluid RRW paths given that $\overline{W}_n \approx nz$ are shown on the right in Figure 4.
The higher path has $z = 1/3$, while the lower path has $z = 1/7$. Note that, for the lower
path, all paths of this shape that start in the interval $[0, 1 - \sqrt{6z}]$ are most likely paths to
the deviation. That is, there is no single most likely path, just a most likely shape that can
occur anywhere within $[0, 1]$.

Figure 4: Coercive rate function example: i.i.d. Gaussian increments. On the left hand side is
$I_{\overline{W}}(z)$ versus z. Shown on the right hand side are most likely RRW paths,
$\psi^*(t) \approx n^{-1} W_{\lfloor nt \rfloor}$, given that $\overline{W}_n \approx nz$.

M/D/1 queue-lengths. A second coercive example, albeit one that requires numerics for
its ultimate solution, is when X is i.i.d., with $X_0$ having a shifted Poisson distribution with rate
$\alpha$ and mean $-\delta = \alpha - \mu$ ($\mu \in \mathbb{N}$),
$P\{X_0 = k\} = e^{-\alpha} \alpha^{k + \mu} / (k + \mu)!$ for $k = -\mu, -\mu + 1, \ldots$. In this
setting, the RRW defined in equation (1) corresponds to the queue-length of an M/D/1 queue
with batch $\mu$ services, where the queue-length is observed at customer departures. That is,
at each service the minimum of the current queue-length and $\mu$ customers are processed in a
single batch. Between services, a Poisson($\alpha$) number of customers arrive to the queue.

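The conditions of Mogul'skii's Theorem are met with $\theta_- = \theta_+ = +\infty$ and the local rate
function is

$$I(x) = (x+\mu)\log\left(\frac{x+\mu}{\alpha}\right) - x - \mu + \alpha \quad \text{if } x \geq -\mu,$$

and $I(x) = \infty$ if $x < -\mu$. The sample path rate function $I_X$ is only finite at absolutely
continuous functions.

Again, without loss of generality assume that $T_0 = 0$ and define $T = T_1$. For a given $z > 0$,
for each $T \in (0,1]$ we first identify paths $\psi^\circ$ that satisfy $\nabla I(\dot\psi^\circ(t)) = \lambda^*(T - t)$ in $[0,T]$ and
$\dot\psi^\circ(T) = -\delta$, which leads to candidate optimal paths of the following form:

$$\dot\psi^\circ(t) = \alpha e^{\lambda^*(T-t)} - \mu \quad \text{for } t \in [0,T].$$

Integrating, we have that

$$\psi^\circ(t) = \frac{\alpha}{\lambda^*}\left(e^{\lambda^* T} - e^{\lambda^*(T-t)}\right) - \mu t.$$

If $T < 1$, using the constraint $\psi^\circ(T) = 0$ gives the following equation for $\lambda^*$ in terms of $T$:

$$\frac{\alpha}{\lambda^*}\left(e^{\lambda^* T} - 1\right) - \mu T = 0.$$

If $T = 1$ and $c := \psi^\circ(1)$ we have the following equation for $\lambda^*$ in terms of $c$:

$$\frac{\alpha}{\lambda^*}\left(e^{\lambda^*} - 1\right) - \mu - c = 0.$$

Both of these are transcendental equations, but can be readily solved numerically for $\lambda^*$.
Once $\lambda^*$ is known, the constraint

$$\int_0^T \psi^\circ(t)\,dt = z$$

gives a transcendental equation for $T$ or $c$ ($= \psi^\circ(1)$) in terms of $z$ and identifies the solution
$\psi^*$ which determines the rate function

$$I_W(z) = \int_0^T I(\dot\psi^*(t))\,dt = (\alpha - \mu)T - \psi^*(T) + \frac{\alpha}{\lambda^*}\left(e^{\lambda^* T}(\lambda^* T - 1) + 1\right).$$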
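The numerical step for the M/D/1 example can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it takes the candidate path to be $\psi^\circ(t) = (\alpha/\lambda^*)(e^{\lambda^* T} - e^{\lambda^*(T-t)}) - \mu t$, obtained by integrating $\dot\psi^\circ(t) = \alpha e^{\lambda^*(T-t)} - \mu$, solves the end-point equation for $\lambda^*$ by bisection, and evaluates the area and the rate in closed form. All function names are hypothetical.

```python
import math

def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def md1_lambda_star(T, alpha, mu, c=0.0):
    """Positive root of (alpha/lam)(e^{lam T} - 1) - mu T - c = 0 (c > 0 only when T = 1)."""
    g = lambda lam: alpha / lam * math.expm1(lam * T) - mu * T - c
    hi = 1.0
    while g(hi) <= 0:  # g is increasing in lam and negative near 0 when alpha < mu
        hi *= 2.0
    return bisect(g, 1e-9, hi)

def md1_path(t, T, lam, alpha, mu):
    """Candidate path psi(t) on [0, T]."""
    return alpha / lam * (math.exp(lam * T) - math.exp(lam * (T - t))) - mu * t

def md1_area(T, lam, alpha, mu):
    """Closed form of the integral of psi over [0, T], i.e. the deviation z."""
    return alpha / lam * (T * math.exp(lam * T) - math.expm1(lam * T) / lam) - mu * T * T / 2

def md1_rate(T, lam, alpha, mu):
    """Integral of I(psi'(t)) over [0, T] in the closed form quoted in the text."""
    end = md1_path(T, T, lam, alpha, mu)
    return (alpha - mu) * T - end + alpha / lam * (math.exp(lam * T) * (lam * T - 1) + 1)

alpha, mu = 0.5, 1.0
for T in (0.5, 1.0):
    lam = md1_lambda_star(T, alpha, mu)
    print(T, lam, md1_area(T, lam, alpha, mu), md1_rate(T, lam, alpha, mu))
```

Sweeping $T$ over $(0,1]$, and then $c > 0$ at $T = 1$, traces out pairs $(z, I_W(z))$ of the kind plotted in Figure 5.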




Figure 5: Coercive rate function example: M/D/1 queue-lengths, Poisson increments. On the left hand side
is $I_W(z)$ versus $z$. Shown on the right hand side are most likely RRW paths, $\phi^*(t) \approx n^{-1} W_{\lfloor nt \rfloor}$, given that
$W_n \approx nz$.

With $\alpha = 0.5$ and $\mu = 1.0$, the numerically calculated rate function is plotted on the left in
Figure 5. The transition from concave to convex again occurs when the most likely path has
both $T = 1$ and $c := \psi^\circ(1) = 0$. Two example most likely paths are plotted on the right hand
side of Figure 5, which display similar features to the most likely paths for the Gaussian
increments RRW.

4.2. Coercive rate function: paths with jumps

D/M/1 waiting times. Let $X$ be i.i.d. with $P(X_0 > x) = \exp(-\alpha(x + \mu^{-1}))$ for $x >
-\mu^{-1}$, giving $E(X_0) = \alpha^{-1} - \mu^{-1} := -\delta$. We assume that $\mu < \alpha$ so that $\delta > 0$. Then
the RRW defined in equation (1) corresponds to waiting times at a stable D/M/1 queue,
where customers arrive at regular intervals of length $\mu^{-1}$ and experience i.i.d. exponentially
distributed service times with mean $\alpha^{-1}$. Cramér's Theorem holds for $\{x_n(1)\}$ with rate
function

$$I(x) = \begin{cases} \alpha\left(x + \dfrac{1}{\mu}\right) - \log\left(\alpha\left(x + \dfrac{1}{\mu}\right)\right) - 1 & \text{if } x \in (-\mu^{-1}, \infty) \\ \infty & \text{otherwise,} \end{cases}$$

which is coercive, so that $T_0^\circ = T_0$ for the optimal path. However, $\theta_+ = \alpha$ so that the
possibility of an initial jump in the most likely path cannot be discounted.

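Again, without loss of generality, assume that $T_0 = 0$ and define $T = T_1$. Using $\nabla I(\dot\psi^\circ(t)) =
\lambda^*(T - t)$ and the constraint $\dot\psi^\circ(T) = -\delta = 1/\alpha - 1/\mu$, candidate solutions must satisfy

$$\dot\psi^\circ(t) = \frac{1}{\alpha + \lambda^*(t - T)} - \frac{1}{\mu} \quad \text{for all } t \in (0,T)$$

and hence, for some initial jump $\psi^\circ(0) = a \geq 0$,

$$\psi^\circ(t) = a + \frac{1}{\lambda^*}\log\left(\frac{\alpha + \lambda^*(t - T)}{\alpha - \lambda^* T}\right) - \frac{t}{\mu} \quad \text{for } t \in [0,T].$$

If $T < 1$, then in addition we have that $\psi^\circ(T) = 0$, which gives the following equation:

$$\alpha - \lambda^* T - \alpha\exp\left(\lambda^*\left(a - \frac{T}{\mu}\right)\right) = 0.$$

If $T = 1$, then we have that $c := \psi^\circ(T) > 0$, which implies that

$$\alpha - \lambda^* - \alpha\exp\left(\lambda^*\left(a - c - \frac{1}{\mu}\right)\right) = 0.$$

Treating $a$ ($= \psi^\circ(0)$) as given, these two transcendental equations can be readily solved for
$\lambda^*$. Finally we have the constraint that $\int_0^T \psi^\circ(t)\,dt = z$, so that

$$z = T\left(a - \frac{1}{\lambda^*}\right) + \frac{\alpha}{(\lambda^*)^2}\log\left(\frac{\alpha}{\alpha - \lambda^* T}\right) - \frac{T^2}{2\mu}$$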

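and

$$\alpha a + \int_0^T I(\dot\psi^\circ(t))\,dt = \alpha\left(\psi^\circ(T) + \frac{T}{\mu}\right) - 2T + \left(\frac{\alpha - \lambda^* T}{\lambda^*}\right)\log\left(\frac{\alpha}{\alpha - \lambda^* T}\right).$$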

For given a > 0, having solved the transcendental equations, the most likely path and its 
associated rate can be calculated. Optimization over a can then be performed numerically. 
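A sketch of this calculation for a given jump height is below. It is an illustration under assumptions, not the authors' code: the path is taken to be $\psi^\circ(t) = a + (1/\lambda^*)\log((\alpha + \lambda^*(t-T))/(\alpha - \lambda^* T)) - t/\mu$, the two end-point equations are combined into the single equation $\alpha - \lambda^* T - \alpha\exp(\lambda^*(a - c - T/\mu)) = 0$ with $c = \psi^\circ(T)$ (so $c = 0$ when $T < 1$), and the cost expression is the one obtained by integrating $I(\dot\psi^\circ)$ directly. The function names and the bisection bracket are hypothetical choices.

```python
import math

def dm1_lambda_star(a, T, alpha, mu, c=0.0, tol=1e-12):
    """Nonzero root in (0, alpha/T) of alpha - lam*T - alpha*exp(lam*(a - c - T/mu)) = 0."""
    k = a - c - T / mu
    h = lambda lam: -lam * T - alpha * math.expm1(lam * k)  # same function, written stably
    lo, hi = 1e-6, alpha / T  # lam = 0 is always a trivial root, so start just above it
    if not (h(lo) > 0 > h(hi)):
        raise ValueError("no sign change: these (a, T, c) admit no candidate path")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def dm1_path(t, a, T, lam, alpha, mu):
    """Candidate path psi(t) on [0, T], including the initial jump a."""
    return a + math.log((alpha + lam * (t - T)) / (alpha - lam * T)) / lam - t / mu

def dm1_area(a, T, lam, alpha, mu):
    """Closed form of the integral of psi over [0, T], i.e. the deviation z."""
    log_term = math.log(alpha / (alpha - lam * T))
    return T * (a - 1 / lam) + alpha / lam**2 * log_term - T * T / (2 * mu)

def dm1_cost(a, T, lam, alpha, mu):
    """Jump cost alpha*a plus the integral of I(psi'(t)) over [0, T]."""
    log_term = math.log(alpha / (alpha - lam * T))
    end = dm1_path(T, a, T, lam, alpha, mu)
    return alpha * (end + T / mu) - 2 * T + (alpha - lam * T) / lam * log_term

alpha, mu = 2.0, 1.0
for a, T, c in ((0.0, 0.5, 0.0), (0.0, 1.0, 0.5), (0.2, 1.0, 0.5)):
    lam = dm1_lambda_star(a, T, alpha, mu, c)
    print(a, T, c, lam, dm1_area(a, T, lam, alpha, mu), dm1_cost(a, T, lam, alpha, mu))
```

For a target $z$, one would scan $a$, and $T$ or $c$, for the combination with area $z$ and smallest cost; that outer optimization is left out of this sketch.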

For example, with $\alpha = 2$ and $\mu = 1$, the rate function is plotted on the left hand side of
Figure 6. It looks similar to the earlier examples, but is asymptotically linear with slope
$\alpha$. The reason for this is best explained by considering the most likely path shown on the
right hand side of Figure 6. For small $z$, $\psi^\circ(0) = a = 0$ and no jump occurs at the start
of the most likely path. However, once $z$ is sufficiently large (approximately 1.67 for these
parameters), the most likely path has a jump at 0 followed by a vertically shifted version of
the largest-area most likely path that doesn't have a jump, as illustrated in Figure 6. The
increase in the rate function for the shift of height $\psi^\circ(0) = a$ (gaining area $a$ over the interval
$[0,1]$), is $\alpha a$, which is why the rate function is ultimately linear with slope $\alpha$.

Figure 6: Coercive rate function with jumps example: D/M/1 waiting times, i.i.d. Exponential increments.
On the left hand side is $I_W(z)$ versus $z$. Shown on the right hand side are most likely RRW paths, $\phi^*(t) \approx
n^{-1} W_{\lfloor nt \rfloor}$, given that $W_n \approx nz$.



4.3. Non-coercive rate function: rate-constrained paths

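M/M/1 queue-length. Let $X$ be a Bernoulli sequence taking values $-1$ and $+1$ with
$\alpha = P(X_0 = +1) < P(X_0 = -1) = 1 - \alpha$. The RRW in equation (1) corresponds to the
queue-length of an M/M/1 queue observed at arrivals and departures. We have $\theta_- = \theta_+ =
+\infty$. The increments rate function $I$ is infinite outside $[-1,1]$ and is non-coercive with f = 1:

$$I(x) = \frac{1+x}{2}\log\left(\frac{1+x}{2\alpha}\right) + \frac{1-x}{2}\log\left(\frac{1-x}{2(1-\alpha)}\right) \quad \text{and f = 1.}$$

Note that $I_W(z) = +\infty$ if $z > 1/2$; if $z = 1/2$, then $T_0^\circ = T = 1$ and $\psi^*(t) = t$ so that
$I_W(1/2) = -\log(\alpha)$. Without loss of generality, let $T_0 = 0$ and define $T = T_1$. Assume that
$T_0^\circ < T$. The equation $\nabla I(\dot\psi^\circ(t)) = \lambda^*(T - t)$ with the boundary condition $\dot\psi^\circ(T) = -\delta =
2\alpha - 1$ gives

$$\dot\psi^\circ(t) = \begin{cases} 1 & \text{if } t \in [0, T_0^\circ] \\ \dfrac{\alpha\exp(2\lambda^*(T-t)) - (1-\alpha)}{\alpha\exp(2\lambda^*(T-t)) + 1 - \alpha} & \text{if } t \in (T_0^\circ, T]. \end{cases}$$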
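By integrating, we are looking at proposed solutions

$$\psi^\circ(t) = \begin{cases} t & \text{if } t \in [0, T_0^\circ] \\ 2T_0^\circ - t + \dfrac{1}{\lambda^*}\log\left(\dfrac{\alpha e^{2\lambda^*(T - T_0^\circ)} + 1 - \alpha}{\alpha e^{2\lambda^*(T-t)} + 1 - \alpha}\right) & \text{if } t \in (T_0^\circ, T]. \end{cases}$$

If $T < 1$, then $\psi^\circ(T) = 0$ gives

$$\alpha e^{2\lambda^*(T - T_0^\circ)} - e^{\lambda^*(T - 2T_0^\circ)} + 1 - \alpha = 0$$

and, in particular, if $T_0^\circ = 0$, then $\lambda^* = \log((1-\alpha)/\alpha)/T$. While if $T = 1$, then $c := \psi^\circ(1) > 0$
($c < 1$) gives the following equation for $\lambda^*$

$$\alpha e^{2\lambda^*(1 - T_0^\circ)} - e^{\lambda^*(c + 1 - 2T_0^\circ)} + 1 - \alpha = 0. \qquad (18)$$

Note that this equation only has a positive solution for $c \in (\max\{0, 2\alpha - 1 + 2(1-\alpha)T_0^\circ\}, 1)$.
The lower bound embodies the fact that the optimal path has $\dot\psi^*(t) > 2\alpha - 1$ and therefore
$c = \psi^*(1) = T_0^\circ + \int_{T_0^\circ}^1 \dot\psi^*(t)\,dt > 2\alpha - 1 + 2(1-\alpha)T_0^\circ$. Once $\lambda^*$ is identified, one can numerically
evaluate the integral

$$\int_0^T \psi^\circ(t)\,dt.$$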



For example, with $\alpha = 1/3$, Figure 7 plots the numerically evaluated rate function $I_W(z)$
versus $z$. The initial shape of the rate function is similar to $\sqrt{x}$, but, as can be seen clearly
in the graph, once $z$ is sufficiently large that the optimal path has $c > 0$, the rate function
increases dramatically.
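The simplest case of the M/M/1 calculation, $T_0^\circ = 0$ with the path returning to zero at $T \leq 1$, needs no root-finding because $\lambda^* = \log((1-\alpha)/\alpha)/T$ in closed form. The following sketch evaluates the corresponding path, its area and its cost by a midpoint rule. It assumes the path $\psi^\circ(t) = -t + (1/\lambda^*)\log((\alpha e^{2\lambda^* T} + 1 - \alpha)/(\alpha e^{2\lambda^*(T-t)} + 1 - \alpha))$; the function names and the choice $T = 0.8$ are hypothetical, and the cases $T_0^\circ > 0$ or $c > 0$ are not covered.

```python
import math

def mm1_lambda_star(T, alpha):
    """Closed-form multiplier when T_0 = 0 and psi(T) = 0."""
    return math.log((1 - alpha) / alpha) / T

def mm1_slope(t, T, lam, alpha):
    """psi'(t) solving grad I(psi'(t)) = lam * (T - t)."""
    e = alpha * math.exp(2 * lam * (T - t))
    return (e - (1 - alpha)) / (e + 1 - alpha)

def mm1_path(t, T, lam, alpha):
    """psi(t) on [0, T] with psi(0) = 0."""
    top = alpha * math.exp(2 * lam * T) + 1 - alpha
    bottom = alpha * math.exp(2 * lam * (T - t)) + 1 - alpha
    return -t + math.log(top / bottom) / lam

def mm1_local_rate(x, alpha):
    """Increments rate function I(x) for x in (-1, 1)."""
    up = (1 + x) / 2 * math.log((1 + x) / (2 * alpha))
    down = (1 - x) / 2 * math.log((1 - x) / (2 * (1 - alpha)))
    return up + down

def midpoint(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

alpha, T = 1 / 3, 0.8
lam = mm1_lambda_star(T, alpha)
z = midpoint(lambda t: mm1_path(t, T, lam, alpha), 0.0, T)
cost = midpoint(lambda t: mm1_local_rate(mm1_slope(t, T, lam, alpha), alpha), 0.0, T)
print("z =", z, "cost =", cost)
```

In this family the slope runs from $1 - 2\alpha$ at $t = 0$ to $2\alpha - 1$ at $t = T$, and the area scales as $T^2$.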



Figure 7: Non-coercive rate function example: M/M/1 queue-lengths, Bernoulli increments. On the left hand
side is $I_W(z)$ versus $z$. Shown on the right hand side are most likely RRW paths, $\phi^*(t) \approx n^{-1} W_{\lfloor nt \rfloor}$, given
that $W_n \approx nz$.

For $z < 0.0682$, the optimal value of $T_1$ is less than 1 and the numerically-identified most
likely path occurs with $T_0^\circ = 0$. For $z > 0.0682$, $T = 1$ and the optimal path also has $T_0^\circ = 0$
apart from, possibly, as $z \to 1/2$. The reason for this caveat is that the cost of a path with
$T_0^\circ = 0$ becomes numerically indistinguishable from those with $T_0^\circ > 0$ if $z \approx 1/2$. To see this,
note that as $z \uparrow 1/2$, $c \uparrow 1$, so that if $T_0^\circ = 0$, then $\lambda^*$, the solution of equation (18), tends
to 1 and the most likely path to large simulated mean has slope close to 1 for a substantial
range of $t$. This is illustrated in the higher path on the right in Figure 7, corresponding to
the most likely path for the deviation $z \approx 0.45$. For this path $\lambda^* \approx 0.941$. Note that the
slope is nearly 1 until near $t = 0.85$, even though technically $T_0^\circ = 0$.



4.4. Simulations

One of the strong deductions of these sample-path arguments is the prediction of the most
likely path that gives rise to a large simulated mean. To illustrate the merits of these predictions
we conducted simulations of the RRW in two settings: Gaussian i.i.d. increments, and
the M/M/1 queue example introduced in Section 4.3.

In each case, the RRW was simulated for a fixed number of steps $n > 1$, and the simulation
was repeated $2 \times 10^8$ times. Of these simulated RRWs, the one with the largest simulated
mean was recorded and compared with the theory laid out in the preceding sections. This theory predicts
the approximation $W_{\lfloor t \rfloor} \approx n\phi^*(t/n)$ for $t \in [0, n]$. The results from two experiments are
illustrated in Figure 8.



[Figure 8 panels; legend in each panel: Theory: $\psi^*(t)$, Observed: $W(t)$.]

Figure 8: RRW with i.i.d. increments. The figure on the left hand side shows experiments obtained
with Gaussian increments. On the right hand side are results obtained for the M/M/1 queue model in which
the increments take on values $\pm 1$. In each case, the observed path has the largest simulated mean out of
$2 \times 10^8$ sampled paths. Also shown in each figure is the corresponding theoretical prediction of the most likely
path, given the observed simulated mean.

In the first experiment, illustrated on the left hand side, the increments of the random walk
were taken to be i.i.d. Gaussian with $\delta = 0.5$, $\sigma^2 = 1$ and the time-horizon $n = 50$. The
second experiment used the M/M/1 queue example found in Section 4.3 with $\alpha = 0.3$ and
$n = 40$. In each experiment, the observed sample path is plotted along with the theoretical
prediction corresponding to the observed simulated mean. The theory's quantitative power
in predicting the shape of the most likely path is apparent in these figures.
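The experimental procedure can be sketched in a few lines. This is a scaled-down illustration, not the code used for Figure 8: it uses 2,000 repetitions rather than $2 \times 10^8$, assumes the simulated mean is the average of $W_1, \ldots, W_n$, and reads the deviation as $z \approx$ (simulated mean)$/n$, in line with the scaling $n^{-1} W_{\lfloor nt \rfloor}$. The names `simulate_rrw` and `largest_mean_path` are hypothetical.

```python
import random

def simulate_rrw(n, draw, rng):
    """One RRW path [W_0, ..., W_n] with W_0 = 0 and increments produced by draw(rng)."""
    path = [0.0]
    for _ in range(n):
        path.append(max(path[-1] + draw(rng), 0.0))
    return path

def largest_mean_path(n, draw, runs, seed=0):
    """Simulate `runs` independent paths and keep the one with the largest simulated mean."""
    rng = random.Random(seed)
    best_path, best_mean = None, -1.0
    for _ in range(runs):
        path = simulate_rrw(n, draw, rng)
        mean = sum(path[1:]) / n
        if mean > best_mean:
            best_path, best_mean = path, mean
    return best_path, best_mean

def gaussian(rng):
    return rng.gauss(-0.5, 1.0)  # delta = 0.5, sigma^2 = 1

def bernoulli(rng):
    return 1.0 if rng.random() < 0.3 else -1.0  # alpha = 0.3

path_g, mean_g = largest_mean_path(50, gaussian, runs=2000)
path_b, mean_b = largest_mean_path(40, bernoulli, runs=2000)
print("Gaussian:  z ~", mean_g / 50)
print("Bernoulli: z ~", mean_b / 40)
```

The recorded path can then be plotted against $n\phi^*(t/n)$ computed for the observed $z$; with far fewer repetitions than in the experiments above, the extreme path is less extreme and the agreement should be expected to be rougher.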
5. Discussion

As a final remark, we have mentioned that our fundamental hypothesis encompasses
the light tailed setting in the absence of long range dependence. However, by
changing the speed of the LDP, this assumption also holds for certain long range dependent
processes. For example, in continuous time it is known, e.g. [9], [35], [29], that fractional Brownian
Motion (fBM) with Hurst parameter $H$ satisfies the LDP at speed $n^{2(1-H)}$ in $D[0,1]$.

As the nature of the speed does not enter the proof of Theorem 3, the first conclusion of that
theorem holds for these processes. However, even for the canonical example of fBM the rate
function is not of the integral form in equation (3) and thus, in the long range dependent
setting, it is hard to deduce if any general properties exist for the most likely paths to a large
simulated mean.